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Abstract 

Kalman filtering and smoothing algorithms are used in many areas, including 
tracking and navigation, medical applications, and financial trend filtering. One of 
the basic assumptions required to apply the Kalman smoothing framework is that 
error covariance matrices are known and given. In this paper, we study a general 
class of inference problems where covariance matrices can depend functionally on 
unknown parameters. In the Kalman framework, this allows modeling situations 
where covariance matrices may depend functionally on the state sequence being 
estimated. We present an extended formulation and novel algorithm for inference 
in this context, specify these to the Kalman smoothing case, and develop an ef- 
ficient implementation that preserves the computational efficiency of the Kalman 
smoother. The method is illustrated with a synthetic numerical example. 



1 Introduction 

The Kalman filter lfl4l and smoother ifTTl are efficient algorithms to estimate the state of a dy- 
namic system given noisy measurements. Over the last 10 years, the optimization perspective on the 
smoothing problem has produced many extensions to dynamic system estimation, including methods 
for smoothing systems with nonlinear process and measurement models Q, systems with nonlinear 
inequality constraints [9], robust Kalman smoothing 12J|4][3l, and smoothing of sparse systems (TJ. 

In all of the above extensions, the variances of process and measurement errors are assumed to be 
fixed and known. In practice, these quantities are often not known, and may in fact depend on the 
state. For example, radar position errors are known to depend on the aspect angle as well as the 
position of the target ifTHl . In some applications [15|, it may be of interest to do Kalman filtering 
in polar coordinates or other coordinates that induce a state dependence in measurement errors. 
Modeling of process error covariance may also be state dependent — for example, Bar-Shalom [6] 
suggests that the right choice of process noise level for flight tracking models depends on the turn 
rate range expected. Therefore if we are estimating turn rate as part of the state, the process noise 
level can be modeled as a function of (a portion of) the state. 

These ideas motivate extensions of the standard Kalman smoothing formulation to situations where 
process and measurement variances have known functional dependence on the state. Several such 
extensions have already been considered. In [18|, the Unscented Transform is used to fit models 
with state-dependent matrices acting on observation noise. Linear systems with additive observation 
noise where measurement error variance is a known function of the state are studied in lfT9l . Linear 
systems with control inputs transformed by state-dependent matrices are considered in lfl2l . Finally, 
adaptive system identification, as presented in IfTTl . also falls into this class. 

In this paper, we formulate the state-dependent covariance problem as a statistical estimation prob- 
lem, and develop algorithms for obtaining the maximum a posteriori (MAP) estimate. In the theo- 
retical development, we allow the process and measurement functions to be nonlinear, and we allow 
the functional dependence of covariance on the state to be nonlinear as well. 
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The paper proceeds as follows. In Section [2] we formulate the Kalman model with state-dependent 
covariances and develop the MAP objective to be optimized. While this objective is motivated by 
the Kalman smoother, it can be used for general nonlinear regression where variance depends in a 
known functional way on the parameters. In Section [3] we build a new algorithm for solving the 
resulting optimization problems that exploits their convex composite structure. In Section [4] we 
provide a numerical experiment using simulated data that demonstrates the performance of the new 
smoother and the potential modeling capabilities of the approach. Further algorithmic details for the 
Kalman smoothing problem are explained in the Appendix. 

2 Kalman Smoothing with State-Dependent Uncertainty 

The dynamic structure of the Kalman smoothing problem is specified as follows: 



xi = gi(x ) + wi, 

Xk = 5fc( x k-i) + w k k = 2,...,N, (1) 
z k = /i fc (x k )+v k k = l,...,N , 

where g/., hk are known (nonlinear) process and measurement functions, and w k e W\ v k e R m (k) 
are mutually independent Gaussian random variables with positive definite covariance matrices Qf. 
and Rj., x k 6 W l are the unknown states, and z k 6 R m ( fe ) are the observed measurements. We 
remove the assumption that Qk and R). are fixed and known, and instead model these covariance 
matrices as C 2 functions of the state. 

In order to develop a simpler notation for estimating the entire state sequence, we define functions 
g : R nN -> R nN and h : W lN -> R M , with M = J^k m ^ from components g k and h k as follows: 



9{x) 







' hi(xi) ' 


%2 - 32(^l) 


, h(x) = 








xn - 9n{xn-i). 




h N (x N )_ 



(2) 



Given a sequence of column vectors {uk} and matrices {T^} we use the following notation: 

R = diag({i4}) 
Q = diag({g fc }) 
x = vec({x k }) 
w = vec({# ,0, 
z = vec({zi, . . 



vec({u k }) = 



Ui 

u 2 

.UN. 



diag({T fc }) = 



Tx 


.0 





T 2 













T N . 



(3) 



•,0}) 
zn}) 



where 50 ■= 9x(xq). 



With this notation, and under Gaussian assumptions, the MAP object for the Kalman smoother is 
given by 

-logdet (Q-^O^-logdet (i?- 1/2 (x))+i||Q- 1 / 2 (x)( 5 (x)- w )||2 + i||i?-V2 (;r)(Ma;) _ z) ||2 

(4) 



3 Convex Composite Formulation 

We apply the Gauss-Newton methodology for minimizing convex composite functions [ 1 1 to the 
objective Q. The first step in this process is to write this objective in convex composite form, that 
is, in the form / = p o F, where p is convex and F is smooth. The choice of the functions p and F 
depend on how we wish to model the representation of the problem. The most straightforward way 
to rewrite Q is the more general form 

J{x) := ] 2 c{x) T W{x)- l c{x) + ilogdetO^Or)), 

where c : R nN R M + nN and W : W lN -> 5++ +nAr are smooth, and S^ nN is the cone of real 
symmetric (M + nN) x (M + nN) positive definite matrices. Then J — po F with 

p( Cl W) := ^(FW^c + i logdet(PF) and F{x) = {c(x), W{x)). 
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Although the function F in this formulation can be assumed smooth, the function p is not convex. 
Indeed, p is the difference of two convex functions. When viewed as a function of (c, W~ x ), it is still 
not jointly convex in these arguments. Hence, in order to obtain a convex composite formulation, a 
different approach must be taken. 

Here, we propose an approach that applies in many practical settings and yields an efficient solution 
procedure. However, a price is paid in a more complex model for the covariance matrices. Specif- 
ically, we assume that the Cholesky factors for Q7 (xk) and R^ 1 {xk) are given to us as explicit 

functions of the state. We denote these factors by Q, 1 ^ 2 (xk) and R k 1 ^ 2 (xk), respectively. In some 
settings, the matrices Qk{xk) and Rk(xk) are modeled as diagonal matrices, in which case the in- 
verse Cholesky factors are easily computed diagonal matrices. We provide an example of this type 
in the final section. Under this modeling assumption, the objective Q can be abstracted to the more 
general form 

K{x) = -c{x) T V{x) T V{x)c{x) - log o det[V(x)} , (5) 

where c : R nN -> R M + nN and V : R nN -> C+£ nN are smooth, and C^+ nN is the cone of 
(M + nN) x (M + nN) real lower triangular matrices with strictly positive entries on the diagonal. 
Then K can be written in convex composite form K(x) = poF with 

p(u,v) = ^U T u- ^log[uj] ; F(x) 



~Fi{x) 




V(x)c(x) 


F 2 (x)_ 




vec[{V u (x)}}_ 



(6) 



The direction finding subproblem in a Gauss-Newton method takes the form 

min p(F(x) +F'(x)d) . 

d 

Linearizing the functions Fi(x) in (|6) yields approximations Fi(x; d) := Fi(x) + Fi(x)d, which in 
turn gives the approximation 

K(x;d)=p[F 1 (x;d),F 2 (x;d)}. (7) 
This is the objective for the direction finding subproblem. Here 

F 1 (x;d) = V(x)c(x) + (V{x)d x c{x) + (c(x) T ®I N )d x V(x))d 

F 2 {x-d) = vec{{V H (x)+d x Vu{x)d}). { > 

Note that we must be sure that F 2 (x; d) is component- wise greater than zero. For details of these 
derivations, see [2]. The Gauss-Newton subproblem is now given by 

min \h{x- d) T F x {x; d) - V log[F 2 (a;; d)} . (9) 

If we ignore dependence on x, this problem can be rewritten as 

min \d T Cd + a T d — log[sj] s.t. s = vec{Vu} + d x vec{Vu}d, (10) 

d£R nN 2 L — ' 
i 

where 

C = {Vd x c+(c T ®I N )d x V} T lVd x c+(c T ® I N )d x V], 
a = c T V T Vd x c+c T V T (c T ®I N )d x V. {n) 
Note that a is the gradient of the quadratic portion of the extended objective with respect to the state 
sequence x. The quantity (c T <£> lN)d x V that appears in ( fTT) can be rewritten as 

{c T ®i N )d x v - zZ?JT N aWi- ( 12 ) 

The Lagrangian associated with this problem is 

L(d, s, A) = ^d T Cd + a T d-J2 lo g[ s «] + A T (s - vec{V iz } - d x vec{V tl }d) . (13) 

i 

The corresponding optimality conditions are 

W d L = Cd + a- d x vec{Vti} T X = 

V S L = -D(s)- 1 l + \ = (14) 

V A £ = s - vec{V u } - d x vec{Vu}d = . 
To solve the direction finding subproblem, we apply a damped Newton method directly to the opti- 
mality conditions. The details are given in section |5.1| of the Appendix. Explicit computations for 



the Kalman smoothing case are given in section 5.2 of the Appendix 
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Figure 1: True state x% (black curve), Extended Smoother estimate (thick red dash-dot), Kalman 
filter estimate (blue dash-dot) and Kalman Smoother estimate (green dashed curve). Measurements 
are displayed as diamonds, and those outside the axis range are displayed on the figure boundary. 



4 Numerical Results 



In this section, we present some numerical experiments to show the advantages and modeling pos- 
sibilities of the new Kalman smoother. The simulation model we consider is similar to the one 
presented in [9 |. The 'ground truth' time series for this simulated example is given by 



'(*) = 



1 - 2 cos(t) 
t - 2sin(t) 



The time between measurements is a constant denoted by At. The models for the mean of Xk given 
Xk-i and for process covariance Qk lTP3l[T6l are 

[At Ai 2 /2" 
Xk-i , Qk - At 2 /2 At 3 /3 • 



gk(x k -i) = 



l 

At 



The measurement model for the mean of zj, given x^ is h^{xk) 
second component of Xk- 



= %2 k , where x 2 .fc denotes the 



The main innovation of the example is in the measurement variance model. The smoother takes 
inverse Cholesky factors as input, and these are assumed to be 3 — xi.fc. Then the variance model is 
given by Rk(xk) = (3 — xi,/c)~ 2 . The measurements were generated using the measurement model, 
from two full periods of the time series x(t), with N = 100 discrete time points equally spaced over 
the interval [0, 4ir], and with noise sampled from iV(0, Rk(xk))- Since the true state x 1 varies in the 
interval [—1,3], the variance for the observations goes to infinity when t is a multiple of ir. 

This simulation illustrates a situation where the measurements are very reliable for some state values, 
but completely unpredictable for others. This phenomenon may occur for example if sensors report 
garbage values when the attitude of a vehicle is in a particular configuration. The measurement 
model presented here can be easily adapted by the user to take their beliefs about the system into 
account. The main point is that as long as the inverse Cholesky factors for the variance can be coded 
as a smooth function of the state, smoothed estimates for state values can be obtained taking into 
account this bad behavior of the measurements. 

The result of the simulation is shown in Figure [T] The extended Kalman smoother (thick red 
dash-dot) is able to recover the ground truth (shown in black) with no appreciable difference. The 
Kalman filter (thin blue dash-dot) is strongly affected by the outlying measurements, as expected. 
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The Kalman smoother (green dashed) is able to smooth the measurements, but cannot pick up the 
oscillations of the ground truth, which are small in magnitude compared to the size of the errors. 

This last point is the most important — it is not just the magnitude of the outliers that makes the 
Kalman smoother fail, although it can be seen to be rather far off the ground truth. The biggest 
challenge of the situation presented is knowing which measurements to trust, since this information 
depends on the state being estimated. 
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5 Appendix 



5.1 Newton Iterations 



E(s,X,d) = 



s - vec{Vu} - d x vec{Vu}d 
D(s)D(X)l - 1 
Cd + a - d x vec{V u } T X 



VE(s,X,d) = 
To get the search direction for ( fT0] >, we solve the equation 



I 

D(X) 




-d x vec{Vu} 
D(s) 
d x vec{V u } T C 



VE(s,X,d) 



As' 
AX 
Ad 



-E(s,X,d). 



(15) 



(16) 



(17) 



Using row operations R 2 — R 2 — D(X)Ri and R 3 = i? 3 + d x vec{Va} T D (s) 1 R 2 , we obtain the 
modified system 



where 



a 



I — d x vec{Vu} 

D(s) D{X)d x vec{V u } 

C + d x vec{V ll } T D( S )' 1 D(X)d x vec{Vu} 



-s + vec{Vu} + d x vec{V H }d 
1 - D(X) (vec{^} + a a ,vec{VJ i }d) 





"As" 




"of 




AA 








Ad 




.7- 



(18) 



7 = -Cd - a + d x vec{V u } T (A + D(s)" 1 (1 - D(A)(vec{V«} + a«vec{K<}d))) 
From here, we can recover the direction: 

Ad = (C + d x vec{V u } T D(s)' 1 D(X)d x vec{Vu}y\ 
AA = i9( s )- 1 (l-C(A)(vec{^} + 5 x vec{^}(d + Ad)) 
As = -s + vec{V u } + d x vec{V u }(d + Ad) 



(19) 



Note that any damping scheme will require that we have s > for the objective to be finite, and 
hence we will also have A > 0. 

5.2 Structure of the Extended Kalman Smoothing Objective 

We now specify the method in the previous section to the Kalman smoothing problem, to demon- 
strate that the computational efficiency of the Kalman smoother can be preserved. 

Define the matrix function V(x) by 

V(x) = diag ({Q- 1/2 (x),R- 1/2 (x)}j , (20) 
with Q and R defined in Q. We also define the vector function c(x) by 



c(x) 



g(x) - w 
h{x) — z 



(21) 



with g and h as in 

With these definitions, objective |5]l is exactly and can be written explicitly as follows: 

K{x) = l(c T (x)V(x) T V(x)c(xfj -]ogodet[V(a:)] 

= \Y,k=\\- z k ~ h k (x k )} T R^ T/2 (x k )R^ 1/2 (x k )[z k - h k (x k )} 

+ \ J2k=oi x k - 9k{x k -i)\ rT Q k T/2 {x k )Q' k 1/2 (x k )[x k - g k (xk-i)] 
-\ogdet{R- 1/2 (x k j) - \ogdet{Q- 1/2 (x k )). 



(22) 
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We now derive the explicit forms for C and a in ( 1 1 1 for the Gauss-Newton subproblem ( 10 1. Recall 
that C and a are given by 

C = [Vd x c+(c T ®I N )d x V} T [Vd x c+(c T (g>I N )d x V], 
a = c T V T Vd x c + c T V T {c T ®I N )d x V. 

where 

(c T ®i N )d x v = zZt nN *d*Vi.. 

Define w(x) and v(x) in ^ by 

w k (x) = x k -gk{xk-i) 

v k (x) = z k -h k (x k ) 
In the expressions below, we will use notation W).,i to mean the ith component of Wk- 
The matrix (c T ® In)8 x V has the following block structure: 



(23) 
(24) 



[(c T ® / w )fl«V] 



"Qi 
















" 


i?i 






















Qi 









































<2a 



















R 3 











































Qn-i 


















Rn-i 





















Qn 


_ 















Rn. 



(25) 



where 



Qi = E 



Ri 



Then we can write down the gradient a: 



-1/2, 



3=1 
m(z) 



r t t 

a = laf 



4 N J 



(26) 



(27) 



where 

+-jQ7 T/2 Q,+i)7i?7 T/2 ^. (28) 



We can compute C in a similar manner. First we write down the explicit form of [Vd x c + (c T 
I N )d x V}: 



-1/2 

-i?7 1/2 V/n + i?i 

o 



-Q2 1/2 V<7 2 






Q 2 " 1/2 + Q 2 

Ri 1/2 \7h2 + R 2 

-03 1/2 Vff3 








Qn-i + Qjv- 

-1/2^ 
"•JV-1 

-Qn-i Vpjv 















Qjv 1//2 + 



-1/2 



(29) 
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Multiplying the transpose of this matrix by itself, obtain the final form of C: 



C = 



Ck — 



A k = 



d 
A 2 C 2 AJ 







A N C N . 

k 



[Q- k 1/2 + Q k HQ- 1/2 



(30) 



-Rl 1/2 Vh k 



RkV\-Rk 1/2 Vh 



iQk 1/2 + Q k ) T Qk 1/2 Vg k 



Qk] + ^gI +1 Qkli^9k+i 

k "t- Rk] 



Remark 5.1 The matrix {30 \ is block tridiagonal, and so it can be inverted with effort 0(n 3 N) 
using the algorithm in [8]. 

Recall the direction finding equation ( p"9| ) in the Extended GN algorithm: 



VF(s,X,d) 

The solution to this system is given by fl9| ): 



As' 
AA 
Ad 



-F(s,X,d). 



where 



Ad = (C + d x vec{V u } T D( s y 1 D{X)d x vec{V u }) 1 */ 
AX = D(s)- 1 [1 - D(X)(vec{V u } + d x vec{V u }{d + Ad))} 
As = -s + vec{V ii } + d x vec{V li }{d + Ad), 



[d x vec{V u } T D(s)- 1 D(X)d x vec{V ii }]k = 

d x(k) diag{Q~ 1/2 } T D{s Qk )- 1 D(X Qk )d x{k )diiig{Q' k 1/2 } 
+d x{k) di&g{Rl 1/2 } T D{s Rk y 1 D{X Rk )d x{k) dmg{R k 1/2 } 
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